Sampling-based Alignment and Hierarchical Sub-sentential Alignment in Chinese-Japanese Translation of Patents

نویسندگان

  • Wei Yang
  • Zhongwen Zhao
  • Baosong Yang
  • Yves Lepage
چکیده

This paper describes Chinese–Japanese translation systems based on different alignment methods using the JPO corpus and our submission (ID: WASUIPS) to the subtask of the 2015 Workshop on Asian Translation. One of the alignment methods used is bilingual hierarchical sub-sentential alignment combined with sampling-based multilingual alignment. We also accelerated this method and in this paper, we evaluate the translation results and time spent on several machine translation tasks. The training time is much faster than the standard baseline pipeline (GIZA++/Moses) and MGIZA/Moses.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combining fast_align with Hierarchical Sub-sentential Alignment for Better Word Alignments

fast align is a simple and fast word alignment tool which is widely used in state-of-the-art machine translation systems. It yields comparable results in the end-to-end translation experiments of various language pairs. However, fast align does not perform as well as GIZA++ when applied to language pairs with distinct word orders, like English and Japanese. In this paper, given the lexical tran...

متن کامل

Using Punctuations and Lengths for Bilingual Sub-sentential Alignment

We present a new approach to aligning bilingual English and Chinese text at sub-sentential level by interleaving alphabetic texts and punctuations matches. With sub-sentential alignment, we expect to improve the effectiveness of alignment at word, chunk and phrase levels and provide finer grained and more reusable translation memory.

متن کامل

Interleaving Text and Punctuations for Bilingual Sub-sentential Alignment

We present a new approach to aligning bilingual English and Chinese text at sub-sentential level by interleaving alphabetic texts and punctuations matches. With sub-sentential alignment, we expect to improve the effectiveness of alignment at word, chunk and phrase levels and provide finer grained and more reusable translation memory.

متن کامل

Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment

In this paper, we propose a novel BTGforest-based alignment method. Based on a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align hierarchically under the constraint of BTG. Our twostep method can achieve the same run-time and comparable translation performance as fast align while it yields smaller phrase tab...

متن کامل

Hierarchical Sub-sentential Alignment with Anymalign

We present a sub-sentential alignment algorithm that relies on association scores between words or phrases. This algorithm is inspired by previous work on alignment by recursive binary segmentation and on document clustering. We evaluate the resulting alignments on machine translation tasks and show that we can obtain state-ofthe-art results, with gains up to more than 4 BLEU points compared to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015